Importing necessary libraries¶

In [1]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.express as px

import plotly.io as pio
pio.renderers.default = "notebook" 

import warnings
import sqlite3
from scipy.stats import ttest_ind
import scipy.stats as stats
warnings.filterwarnings('ignore')

Loading Dataset¶

In [2]:
# Creating connection 
conn = sqlite3.connect('inventory.db')

# fetching vendor summary data
df = pd.read_sql_query("select * from vendor_sales_summary", conn)
df.head()
Out[2]:
VendorNumber VendorName Brand Description PurchasePrice ActualPrice Volume TotalPurchaseQuantity TotalPurchaseDollars TotalSalesQuantity TotalSalesDollars TotalSalesPrice TotalExciseTax FreightCost GrossProfit ProfitMargin StockTurnover SalesPurchaseRatio
0 2 IRA GOLDMAN AND WILLIAMS, LLP 90085 Ch Lilian 09 Ladouys St Este 23.86 36.99 750.0 8 190.88 18.0 665.82 295.92 2.00 27.08 474.94 71.331591 2.250000 3.488160
1 2 IRA GOLDMAN AND WILLIAMS, LLP 90609 Flavor Essence Variety 5 Pak 17.00 24.99 162.5 320 5440.00 24.0 599.76 449.82 0.52 27.08 -4840.24 -807.029478 0.075000 0.110250
2 54 AAPER ALCOHOL & CHEMICAL CO 990 Ethyl Alcohol 200 Proof 105.07 134.49 3750.0 1 105.07 0.0 0.00 0.00 0.00 0.48 -105.07 -inf 0.000000 0.000000
3 60 ADAMBA IMPORTS INTL INC 771 Bak's Krupnik Honey Liqueur 11.44 14.99 750.0 39 446.16 47.0 704.53 494.67 37.01 367.52 258.37 36.672675 1.205128 1.579097
4 60 ADAMBA IMPORTS INTL INC 3401 Vesica Vodka 11.10 14.99 1750.0 6 66.60 0.0 0.00 0.00 0.00 367.52 -66.60 -inf 0.000000 0.000000

Exploratory Data Analysis¶

  • Previously we examined the various tables in the database to identify key variables, understand their relationships and determine which one should be included in the final analysis.

  • In this phase of EDA we'll analyse the resultant table to gain insights into the distribution of each column. This will help us understand data patterns, identify anomalies and ensure data quality defore proceeding with further analysis.

In [3]:
# Summary statistics
df.describe().T
Out[3]:
count mean std min 25% 50% 75% max
VendorNumber 10692.0 1.065065e+04 18753.519148 2.00 3951.000000 7153.000000 9552.000000 2.013590e+05
Brand 10692.0 1.803923e+04 12662.187074 58.00 5793.500000 18761.500000 25514.250000 9.063100e+04
PurchasePrice 10692.0 2.438530e+01 109.269375 0.36 6.840000 10.455000 19.482500 5.681810e+03
ActualPrice 10692.0 3.564367e+01 148.246016 0.49 10.990000 15.990000 28.990000 7.499990e+03
Volume 10692.0 8.473605e+02 664.309212 50.00 750.000000 750.000000 750.000000 2.000000e+04
TotalPurchaseQuantity 10692.0 3.140887e+03 11095.086769 1.00 36.000000 262.000000 1975.750000 3.376600e+05
TotalPurchaseDollars 10692.0 3.010669e+04 123067.799627 0.71 453.457500 3655.465000 20738.245000 3.811252e+06
TotalSalesQuantity 10692.0 3.077482e+03 10952.851391 0.00 33.000000 261.000000 1929.250000 3.349390e+05
TotalSalesDollars 10692.0 4.223907e+04 167655.265984 0.00 729.220000 5298.045000 28396.915000 5.101920e+06
TotalSalesPrice 10692.0 1.879378e+04 44952.773386 0.00 289.710000 2857.800000 16059.562500 6.728193e+05
TotalExciseTax 10692.0 1.774226e+03 10975.582240 0.00 4.800000 46.570000 418.650000 3.682428e+05
FreightCost 10692.0 6.143376e+04 60938.458032 0.09 14069.870000 50293.620000 79528.990000 2.570321e+05
GrossProfit 10692.0 1.213238e+04 46224.337964 -52002.78 52.920000 1399.640000 8660.200000 1.290668e+06
ProfitMargin 10692.0 -inf NaN -inf 13.324515 30.405457 39.956135 9.971666e+01
StockTurnover 10692.0 1.706793e+00 6.020460 0.00 0.807229 0.981529 1.039342 2.745000e+02
SalesPurchaseRatio 10692.0 2.504390e+00 8.459067 0.00 1.153729 1.436894 1.665449 3.529286e+02
In [4]:
# Select numeric columns only
numeric_cols = df.select_dtypes(include='number').columns.tolist()

# Create 4x4 subplots
fig = make_subplots(
    rows=4, cols=4, 
    subplot_titles=numeric_cols)

# Add histogram for each numeric column
for i, col in enumerate(numeric_cols):
    row = i // 4 + 1
    col_pos = i % 4 + 1
    fig.add_trace(
        go.Histogram(x=df[col], nbinsx=50, name=col),
        row=row, col=col_pos)

# Update layout
fig.update_layout(
    height=1200, width=1200,
    title_text="4x4 Numeric Plot Matrix",
    showlegend=False)

fig.show()
In [5]:
# Outliers detection with Boxplot
# Create 4x4 subplots
fig = make_subplots(
    rows=4, cols=4, 
    subplot_titles=numeric_cols)

# Add box plot for each numeric column
for i, col in enumerate(numeric_cols):
    row = i // 4 + 1
    col_pos = i % 4 + 1
    fig.add_trace(
        go.Box(y=df[col], name=col, boxpoints='outliers'),
        row=row, col=col_pos)

# Update layout
fig.update_layout(
    height=1200, width=1200,
    title_text="4x4 Box Plot Matrix",
    showlegend=False)

fig.show()

Summary Statistics Insights:¶

Negative and Zero Values:¶

  • Gross Profit : Minimumvalue is -52,002.78 , indicating losses. Some products or transections may be sellling at a loss due to high costs or selling at discounts lower than the purchase price.
  • Profit Margin : It has a minimun of -infinity , which suggests cases where revenue is Zero or even lower than costs.
  • Total Sales Quantuty & Dollars : Minimum values are Zero, meaning some products were purchased but never sold. These could be clow moving or obsolete stock.

Outliers indicated by high Standard Deviations :¶

  • Purchase & Actual Price : The max values (5,681.81 & 7,499.99) are significantly higher than the mean(24.39 & 35.64), indicating potential premium product.
  • Freight Cost : Huge Variation, from 0.09 to 2,57,032.07, suggests logistics inefficiencies or bulk shipments.
  • Stock Turnover : Ranges from 0 to 274.5, implying some product were sold extremely fast while others remain in stock indefinitely. Value more than 1 sold quantity for that product is higher than purchased quantity due to either sales are being fulfilled from older stock.
In [6]:
# filtering the data by removing inconsistencies
df = pd.read_sql_query("""
SELECT *
FROM vendor_sales_summary
WHERE GrossProfit > 0
AND ProfitMargin > 0
AND TotalSalesQuantity > 0""", conn)
df.head()
Out[6]:
VendorNumber VendorName Brand Description PurchasePrice ActualPrice Volume TotalPurchaseQuantity TotalPurchaseDollars TotalSalesQuantity TotalSalesDollars TotalSalesPrice TotalExciseTax FreightCost GrossProfit ProfitMargin StockTurnover SalesPurchaseRatio
0 2 IRA GOLDMAN AND WILLIAMS, LLP 90085 Ch Lilian 09 Ladouys St Este 23.86 36.99 750.0 8 190.88 18.0 665.82 295.92 2.00 27.08 474.94 71.331591 2.250000 3.488160
1 60 ADAMBA IMPORTS INTL INC 771 Bak's Krupnik Honey Liqueur 11.44 14.99 750.0 39 446.16 47.0 704.53 494.67 37.01 367.52 258.37 36.672675 1.205128 1.579097
2 105 ALTAMAR BRANDS LLC 2529 Right Gin 23.25 29.99 750.0 12 279.00 12.0 359.88 59.98 9.44 62.39 80.88 22.474158 1.000000 1.289892
3 105 ALTAMAR BRANDS LLC 8412 Tequila Ocho Plata Fresno 35.71 49.99 750.0 320 11427.20 307.0 15346.93 12947.41 242.15 62.39 3919.73 25.540808 0.959375 1.343018
4 200 AMERICAN SPIRITS EXCHANGE 20789 Zin-phomaniac Znfdl 9.73 14.99 750.0 96 934.08 84.0 1511.16 287.84 9.43 6.19 577.08 38.187882 0.875000 1.617806
In [7]:
# Select numeric columns only
numeric_cols = df.select_dtypes(include='number').columns.tolist()

# Create 4x4 subplots
fig = make_subplots(
    rows=4, cols=4, 
    subplot_titles=numeric_cols)

# Add histogram for each numeric column
for i, col in enumerate(numeric_cols):
    row = i // 4 + 1
    col_pos = i % 4 + 1
    fig.add_trace(
        go.Histogram(x=df[col], nbinsx=50, name=col),
        row=row, col=col_pos)

# Update layout
fig.update_layout(
    height=1200, width=1200,
    title_text="4x4 Numeric Plot Matrix",
    showlegend=False)

fig.show()
In [8]:
df.columns
Out[8]:
Index(['VendorNumber', 'VendorName', 'Brand', 'Description', 'PurchasePrice',
       'ActualPrice', 'Volume', 'TotalPurchaseQuantity',
       'TotalPurchaseDollars', 'TotalSalesQuantity', 'TotalSalesDollars',
       'TotalSalesPrice', 'TotalExciseTax', 'FreightCost', 'GrossProfit',
       'ProfitMargin', 'StockTurnover', 'SalesPurchaseRatio'],
      dtype='object')
In [9]:
# Plotting categorical columns 
category_cols= ['VendorName', 'Description']

# Create subplots
fig = make_subplots(
    rows=1, cols=2, 
    subplot_titles=category_cols)

vendor_counts= df['VendorName'].value_counts().head(10)
desc_counts= df['VendorName'].value_counts().head(10)

# Add VendorName bar chart
fig.add_trace(
    go.Bar(x=vendor_counts.index, y=vendor_counts.values, name="VendorName"),
    row=1, col=1
)

# Add Description bar chart
fig.add_trace(
    go.Bar(x=desc_counts.index, y=desc_counts.values, name="Description"),
    row=1, col=2
)

# bar chart for top 10 VendorName & Description
fig.update_layout(height=800, width=800)
In [10]:
# Correlation Heatmap
corr_matrix = df[numeric_cols].corr()

# plot heatmap
fig= px.imshow(
    corr_matrix,
    text_auto=True,
    color_continuous_scale="RdBu_r",
    title="Correlation Heatmap of Numeric Features"
)

fig.update_layout(height=800, width=800)
fig.show()

Correlation Insights¶

  • Purchase Price has weak correlation with ToatalSalesDollars & GrossProfit. That indicates price variation do not significantly impact sales revenue or profit.
  • Strong correlation between total purchase quantity and total sales quantity(.999), which confirms efficient inventory turnover.
  • The

Data Analysis¶

  • Identify Brands that needs promotional or pricing adjustments which exhibits lower sales performance but higher profit mergin.
In [11]:
brand_performance= df.groupby('Description').agg({
    'TotalSalesDollars': 'sum',
    'ProfitMargin' : 'mean'
}).reset_index()
In [12]:
low_sales_threshold= brand_performance['TotalSalesDollars'].quantile(0.15)
high_profit_threshold= brand_performance['ProfitMargin'].quantile(0.85)
In [13]:
# Filtering brands with low sales but high profit margin
target_brands = brand_performance[
    (brand_performance['TotalSalesDollars'] <= low_sales_threshold) &
    (brand_performance['ProfitMargin'] >= high_profit_threshold)
]
print('Brands with Low sales but High profit Margins:')
display(target_brands.sort_values('TotalSalesDollars'))
Brands with Low sales but High profit Margins:
Description TotalSalesDollars ProfitMargin
6199 Santa Rita Organic Svgn Bl 9.99 66.466466
2369 Debauchery Pnt Nr 11.58 65.975820
2070 Concannon Glen Ellen Wh Zin 15.95 83.448276
2188 Crown Royal Apple 27.86 89.806174
6237 Sauza Sprklg Wild Berry Marg 27.96 82.153076
... ... ... ...
5074 Nanbu Bijin Southern Beauty 535.68 76.747312
2271 Dad's Hat Rye Whiskey 538.89 81.851584
57 A Bichot Clos Marechaudes 539.94 67.740860
6245 Sbragia Home Ranch Merlot 549.75 66.444748
3326 Goulee Cos d'Estournel 10 558.87 69.434752

198 rows × 3 columns

In [14]:
fig = px.scatter(
    target_brands,
    x="TotalSalesDollars",
    y="ProfitMargin",
    size="TotalSalesDollars",
    color="ProfitMargin",
    hover_data=["Description"],   # show labels directly
    title="Target Brands: Profit Margin vs Total Sales"
)

# Add threshold lines
fig.add_vline(x=low_sales_threshold, line_dash="dash", line_color="red",
              annotation_text="Sales Threshold", annotation_position="top left")

fig.add_hline(y=high_profit_threshold, line_dash="dash", line_color="blue",
              annotation_text="Profit Threshold", annotation_position="bottom right")

fig.update_layout(height= 600, width= 800)
fig.show()  

Which vendors and brands demonstrates the highest sales performance

In [15]:
def format_dollar(value):
    if value >= 1000000:
        return f'{value / 1000000:.2f}M'
    elif value >= 1000:
        return f'{value/1000:.2f}K'
    else:
        return str(value)
In [16]:
# Top vendors and brands by Performance
top_vendors= df.groupby('VendorName')['TotalSalesDollars'].sum().nlargest(10)
top_brands= df.groupby('Description')['TotalSalesDollars'].sum().nlargest(10)
top_vendors.apply(lambda x : format_dollar(x))
Out[16]:
VendorName
DIAGEO NORTH AMERICA INC      67.99M
MARTIGNETTI COMPANIES         39.33M
PERNOD RICARD USA             32.06M
JIM BEAM BRANDS COMPANY       31.42M
BACARDI USA INC               24.85M
CONSTELLATION BRANDS INC      24.22M
E & J GALLO WINERY            18.40M
BROWN-FORMAN CORP             18.25M
ULTRA BEVERAGE COMPANY LLP    16.50M
M S WALKER INC                14.71M
Name: TotalSalesDollars, dtype: object
In [17]:
top_brands.apply(lambda x : format_dollar(x))
Out[17]:
Description
Jack Daniels No 7 Black    7.96M
Tito's Handmade Vodka      7.40M
Grey Goose Vodka           7.21M
Capt Morgan Spiced Rum     6.36M
Absolut 80 Proof           6.24M
Jameson Irish Whiskey      5.72M
Ketel One Vodka            5.07M
Baileys Irish Cream        4.15M
Kahlua                     3.60M
Tanqueray                  3.46M
Name: TotalSalesDollars, dtype: object
In [18]:
# Visualizing top 10 vendors and brands
# Create subplots
fig = make_subplots(
    rows=1, cols=2, )

# Add VendorName bar chart
fig.add_trace(
    go.Bar(x=top_vendors.index, y=top_vendors.values, name="VendorName"),
    row=1, col=1
)

# Add VendorName bar chart
fig.add_trace(
    go.Bar(x=top_brands.index, y=top_brands.values, name="VendorName"),
    row=1, col=2
)

fig.update_layout(height=600, width=1000, title_text="Top 10 Vendors and Brands by Total Sales")

fig.show()

Which vendor contribute the most to total purchase dollar¶

In [19]:
df.head()
Out[19]:
VendorNumber VendorName Brand Description PurchasePrice ActualPrice Volume TotalPurchaseQuantity TotalPurchaseDollars TotalSalesQuantity TotalSalesDollars TotalSalesPrice TotalExciseTax FreightCost GrossProfit ProfitMargin StockTurnover SalesPurchaseRatio
0 2 IRA GOLDMAN AND WILLIAMS, LLP 90085 Ch Lilian 09 Ladouys St Este 23.86 36.99 750.0 8 190.88 18.0 665.82 295.92 2.00 27.08 474.94 71.331591 2.250000 3.488160
1 60 ADAMBA IMPORTS INTL INC 771 Bak's Krupnik Honey Liqueur 11.44 14.99 750.0 39 446.16 47.0 704.53 494.67 37.01 367.52 258.37 36.672675 1.205128 1.579097
2 105 ALTAMAR BRANDS LLC 2529 Right Gin 23.25 29.99 750.0 12 279.00 12.0 359.88 59.98 9.44 62.39 80.88 22.474158 1.000000 1.289892
3 105 ALTAMAR BRANDS LLC 8412 Tequila Ocho Plata Fresno 35.71 49.99 750.0 320 11427.20 307.0 15346.93 12947.41 242.15 62.39 3919.73 25.540808 0.959375 1.343018
4 200 AMERICAN SPIRITS EXCHANGE 20789 Zin-phomaniac Znfdl 9.73 14.99 750.0 96 934.08 84.0 1511.16 287.84 9.43 6.19 577.08 38.187882 0.875000 1.617806
In [20]:
vendor_performance= df.groupby('VendorName').agg({
    'TotalSalesDollars': 'sum',
    'GrossProfit': 'sum',
    'TotalPurchaseDollars': 'sum'
}).reset_index()
In [21]:
vendor_performance['PurchaseContribution(%)'] = (vendor_performance['TotalPurchaseDollars'] / vendor_performance['TotalPurchaseDollars'].sum()) * 100
vendor_performance
Out[21]:
VendorName TotalSalesDollars GrossProfit TotalPurchaseDollars PurchaseContribution(%)
0 ADAMBA IMPORTS INTL INC 704.53 258.37 446.16 0.000145
1 ALISA CARR BEVERAGES 104470.94 78772.82 25698.12 0.008362
2 ALTAMAR BRANDS LLC 15706.81 4000.61 11706.20 0.003809
3 AMERICAN SPIRITS EXCHANGE 1511.16 577.08 934.08 0.000304
4 AMERICAN VINTAGE BEVERAGE 139603.53 35167.85 104435.68 0.033981
... ... ... ... ... ...
114 WEIN BAUER INC 56217.13 13522.49 42694.64 0.013892
115 WESTERN SPIRITS BEVERAGE CO 405254.83 106837.97 298416.86 0.097097
116 WILLIAM GRANT & SONS INC 7569876.20 1693337.94 5876538.26 1.912074
117 WINE GROUP INC 8304043.28 3100242.11 5203801.17 1.693183
118 ZORVINO VINEYARDS 124189.59 38066.88 86122.71 0.028022

119 rows × 5 columns

In [22]:
vendor_performance = round(vendor_performance.sort_values('PurchaseContribution(%)', ascending= False), 2)
In [23]:
top_vendors = vendor_performance.head(10)
top_vendors['TotalSalesDollars'] = top_vendors['TotalSalesDollars'].apply(lambda x : format_dollar(x))
top_vendors['GrossProfit'] = top_vendors['GrossProfit'].apply(lambda x : format_dollar(x))
top_vendors['TotalPurchaseDollars'] = top_vendors['TotalPurchaseDollars'].apply(lambda x : format_dollar(x))
top_vendors
Out[23]:
VendorName TotalSalesDollars GrossProfit TotalPurchaseDollars PurchaseContribution(%)
25 DIAGEO NORTH AMERICA INC 67.99M 17.89M 50.10M 16.30
57 MARTIGNETTI COMPANIES 39.33M 13.83M 25.50M 8.30
68 PERNOD RICARD USA 32.06M 8.21M 23.85M 7.76
46 JIM BEAM BRANDS COMPANY 31.42M 7.93M 23.49M 7.64
6 BACARDI USA INC 24.85M 7.42M 17.43M 5.67
20 CONSTELLATION BRANDS INC 24.22M 8.95M 15.27M 4.97
11 BROWN-FORMAN CORP 18.25M 5.01M 13.24M 4.31
30 E & J GALLO WINERY 18.40M 6.33M 12.07M 3.93
106 ULTRA BEVERAGE COMPANY LLP 16.50M 5.34M 11.17M 3.63
53 M S WALKER INC 14.71M 4.94M 9.76M 3.18
In [24]:
top_vendors['Cumulitive Purchase(%)'] = top_vendors['PurchaseContribution(%)'].cumsum()
top_vendors
Out[24]:
VendorName TotalSalesDollars GrossProfit TotalPurchaseDollars PurchaseContribution(%) Cumulitive Purchase(%)
25 DIAGEO NORTH AMERICA INC 67.99M 17.89M 50.10M 16.30 16.30
57 MARTIGNETTI COMPANIES 39.33M 13.83M 25.50M 8.30 24.60
68 PERNOD RICARD USA 32.06M 8.21M 23.85M 7.76 32.36
46 JIM BEAM BRANDS COMPANY 31.42M 7.93M 23.49M 7.64 40.00
6 BACARDI USA INC 24.85M 7.42M 17.43M 5.67 45.67
20 CONSTELLATION BRANDS INC 24.22M 8.95M 15.27M 4.97 50.64
11 BROWN-FORMAN CORP 18.25M 5.01M 13.24M 4.31 54.95
30 E & J GALLO WINERY 18.40M 6.33M 12.07M 3.93 58.88
106 ULTRA BEVERAGE COMPANY LLP 16.50M 5.34M 11.17M 3.63 62.51
53 M S WALKER INC 14.71M 4.94M 9.76M 3.18 65.69
Here we can see from the Cumulitive Purchase column that top 10 vendora are contributiing more than 65 % in TotalPurchaseDollars¶
In [25]:
figure_title= "Top 10 Vendors by Purchase Contribution"
fig = px.bar(   
    top_vendors, 
    x='VendorName', 
    y='PurchaseContribution(%)',
    text='PurchaseContribution(%)',
    title= figure_title,
    color='PurchaseContribution(%)',
    color_continuous_scale= 'RdBu_r'
)
fig.update_layout(height=600, width=800, coloraxis_showscale=False)
fig.show()  
How much total procurement is dependent on the top vendors¶
In [26]:
print(f'Total purchase contribution of Top 10 veendors is {round(top_vendors['PurchaseContribution(%)'].sum(),2)} %')
Total purchase contribution of Top 10 veendors is 65.69 %
In [27]:
vendors= list(top_vendors['VendorName'].values)
purchase_contribution= list(top_vendors['PurchaseContribution(%)'].values)
total_contribution= sum(purchase_contribution)
remaining_contribution= 100 - total_contribution

# Append other vendors category
vendors.append('Other Vendors')
purchase_contribution.append(remaining_contribution)
fig = px.pie(
    names=vendors, 
    values=purchase_contribution, 
    title='Purchase Contribution Distribution',
    color_discrete_sequence=px.colors.sequential.RdBu
)
fig.update_traces(textposition='inside', textinfo='percent+label')
fig.update_layout(height=600, width=800)
fig.show()

Does purchasing in bulk reduce the unit price and waht is the optimal purchase volume for cost saving.¶

In [28]:
df['UnitPurchasePrice'] = df['TotalPurchaseDollars'] / df['TotalPurchaseQuantity']
df['OrderSize']= pd.qcut(df['TotalPurchaseQuantity'], q=4, labels=['Small', 'Medium', 'Large', 'Very Large'])
In [29]:
df.groupby('OrderSize')['UnitPurchasePrice'].mean().reset_index()
Out[29]:
OrderSize UnitPurchasePrice
0 Small 46.677651
1 Medium 15.858090
2 Large 14.098238
3 Very Large 10.172097
Higher Order Size ------> Lower unit Purchase Price¶
In [30]:
# Ploting the effect of order size on unit purchase price
# Higher Order Size  ------>  Lower unit Purchase Price
fig = px.box(
    df,
    x="OrderSize",
    y="UnitPurchasePrice",
    color="OrderSize",
    title="Effect of Order Size on Unit Purchase Price"
)

fig.update_layout(height=600, width=800, showlegend=False)
fig.show()
Effect of Odrder size in Unit purchase price¶
  • Vendor buying in bulk (Very Large - OrderSize) gets the lowest unit price($10.172097 per unit), meaning higher mergin if they can manage inventory efficiently.
  • The price difference between Small and Medium Order Size is substantial. (~ 66.04% reduction in unit cost)
  • This suggests that bulk pricing strategies successfully encourage Vendors to purchase in large volumes, leading to higher average sales despite lower per unit revenue.

Which vendors have low stock turnover, indicating excess stock and slow moving products ?¶

In [31]:
df[df['StockTurnover']< 1].groupby('VendorName')[['StockTurnover']].mean().reset_index().sort_values('StockTurnover', ascending= True).head(10)
Out[31]:
VendorName StockTurnover
0 ALISA CARR BEVERAGES 0.615385
36 HIGHLAND WINE MERCHANTS LLC 0.708333
60 PARK STREET IMPORTS LLC 0.751306
19 Circa Wines 0.755676
26 Dunn Wine Brokers 0.766022
15 CENTEUR IMPORTS LLC 0.773953
78 SMOKY QUARTZ DISTILLERY LLC 0.783835
90 TAMWORTH DISTILLING 0.797078
91 THE IMPORTED GRAPE LLC 0.807569
101 WALPOLE MTN VIEW WINERY 0.820548
How much stock is locked in unsold inventory per vendor and which vendors contribute the most to it ?¶
In [32]:
df['TotalUnsoldInventoryValue'] = (df['TotalPurchaseQuantity'] - df['TotalSalesQuantity']) * df['PurchasePrice']
print(f'{format_dollar(df['TotalUnsoldInventoryValue'].sum())} is the total value of unsold inventory across all vendors.')
2.71M is the total value of unsold inventory across all vendors.
In [33]:
# Aggregate capital locked in unsold inventory per vendor
inventory_velue_per_vendor = df.groupby('VendorName')['TotalUnsoldInventoryValue'].sum().reset_index()

# Sort Vendors with highest unsold loked inventory value
inventory_velue_per_vendor = inventory_velue_per_vendor.sort_values(by='TotalUnsoldInventoryValue', ascending=False)
inventory_velue_per_vendor['TotalUnsoldInventoryValue'] = inventory_velue_per_vendor ['TotalUnsoldInventoryValue'].apply(format_dollar)
inventory_velue_per_vendor.head(10)
Out[33]:
VendorName TotalUnsoldInventoryValue
25 DIAGEO NORTH AMERICA INC 722.21K
46 JIM BEAM BRANDS COMPANY 554.67K
68 PERNOD RICARD USA 470.63K
116 WILLIAM GRANT & SONS INC 401.96K
30 E & J GALLO WINERY 228.28K
79 SAZERAC CO INC 198.44K
11 BROWN-FORMAN CORP 177.73K
20 CONSTELLATION BRANDS INC 133.62K
61 MOET HENNESSY USA INC 126.48K
77 REMY COINTREAU USA INC 118.60K
What is the 95% confidence intervals for profit margins of top-performing vendors ?¶
In [34]:
top_threshold = df['TotalSalesDollars'].quantile(0.75)
low_threshold = df['TotalSalesDollars'].quantile(0.25)
In [35]:
top_vendors = df[df['TotalSalesDollars'] >= top_threshold]['ProfitMargin'].dropna()
low_vendors = df[df['TotalSalesDollars'] <= low_threshold]['ProfitMargin'].dropna()
In [36]:
def confidence_interval(data, confidence=0.95):
    n = len(data)
    mean = np.mean(data)
    sem = stats.sem(data)
    margin_err = sem * stats.t.ppf((1 + confidence) / 2., n-1)
    return mean, mean - margin_err, mean + margin_err         
In [37]:
top_mean, top_lower, top_upper = confidence_interval(top_vendors)
low_mean, low_lower, low_upper = confidence_interval(low_vendors)
In [38]:
print(f"Top Vendors Profit Margin: Mean={top_mean:.2f}, 95% CI=({top_lower:.2f}, {top_upper:.2f})")
print(f"Low Vendors Profit Margin: Mean={low_mean:.2f}, 95% CI=({low_lower:.2f}, {low_upper:.2f})")

fig = go.Figure()

# KDE for top vendors
top_kde = stats.gaussian_kde(top_vendors)
x_top = np.linspace(top_vendors.min(), top_vendors.max(), 200)
fig.add_trace(go.Scatter(
    x=x_top, y=top_kde(x_top),
    fill='tozeroy', name='Top Vendors', line=dict(color='blue')
))

# KDE for low vendors
low_kde = stats.gaussian_kde(low_vendors)
x_low = np.linspace(low_vendors.min(), low_vendors.max(), 200)
fig.add_trace(go.Scatter(
    x=x_low, y=low_kde(x_low),
    fill='tozeroy', name='Low Vendors', line=dict(color='orange')
))

# Mean lines
fig.add_vline(x=top_mean, line_dash="dash", line_color="blue", annotation_text="Top Mean", annotation_position="top left")
fig.add_vline(x=low_mean, line_dash="dash", line_color="orange", annotation_text="Low Mean", annotation_position="top right")

# 95% CI shaded regions
fig.add_shape(type="rect", x0=top_lower, x1=top_upper, y0=0, y1=max(top_kde(x_top).max(), low_kde(x_low).max()),
              fillcolor="blue", opacity=0.15, line_width=0, layer="below")
fig.add_shape(type="rect", x0=low_lower, x1=low_upper, y0=0, y1=max(top_kde(x_top).max(), low_kde(x_low).max()),
              fillcolor="orange", opacity=0.15, line_width=0, layer="below")

fig.update_layout(
    title="Profit Margin Distribution: Top vs Low Vendors",
    xaxis_title="Profit Margin",
    yaxis_title="Density",
    width=900, height=500
)
fig.show()
Top Vendors Profit Margin: Mean=31.18, 95% CI=(30.74, 31.61)
Low Vendors Profit Margin: Mean=41.57, 95% CI=(40.50, 42.64)
  • The coonfidance interval for low performing vendors (40.48% - 42.62%) is significantly higher than that of top-performing vnedors (30.74% - 31.61).
  • This suggestis that vendors with lower sales tend to maintain higher profir margins, potentially due to premium-pricing or lower operational-cost.
  • For High-Performing Vendors: If they aim to improve profitibily, they could explore selective pricing adjustment, cost optimization or bundling strategies.
  • For Low-Performing Vendors: Desoite their high profit margins, their low sales volume indicate a need for marketing, competitive pricing or improved distribution strategies.

Hypothesis testing¶

is there a significant difference in profit margins between top performing and low performing vendors?¶

  • $H_0$ (Null Hypothesis): There is no significant difference in the mean profit margins and top-performing & low0performing vendors.
  • $H_1$ (Alternative Hypothesis): The mean profit margins for top-performing & low-performing vendors are significantly different.
In [39]:
top_threshold = df['TotalSalesDollars'].quantile(0.75)
low_threshold = df['TotalSalesDollars'].quantile(0.25)

top_vendors = df[df['TotalSalesDollars'] >= top_threshold]['ProfitMargin'].dropna()
low_vendors = df[df['TotalSalesDollars'] <= low_threshold]['ProfitMargin'].dropna()

# Perform sample t-test
t_stat, p_value = ttest_ind(top_vendors, low_vendors, equal_var=False)

# Print results
print(f"T-statistic: {t_stat:.4f}, P-value: {p_value:.4f}")
if p_value < 0.05:
    print("Reject the null hypothesis: There is a Significant difference in profit margins.")
else:
    print("Fail to reject the null hypothesis: There is No significant difference in profit margins.")
T-statistic: -17.6695, P-value: 0.0000
Reject the null hypothesis: There is a Significant difference in profit margins.
In [ ]: